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q Abstract 

A statistical learning/inference framework for color demosaicing is presented. We start with simplistic 
[T i assumptions about color constancy, and recast color demosaicing as a blind linear inverse problem: color 

parameterizes the unknown kernel, while brightness takes on the role of a latent variable. An expectation- 
maximization algorithm naturally suggests itself for the estimation of them both. Then, as we gradually 
broaden the family of hypothesis where color is learned, we let our demosaicing behave adaptively, in 

u 

a manner that reflects our prior knowledge about the statistics of color images. We show that we can 

o 



incorporate realistic, learned priors without essentially changing the complexity of the simple expectation- 
maximization algorithm we started with. 
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I. Introduction 

O 

t> Color demosaicing purports to regenerate a full color image from data collected by a single array of 

photocells each of whom is sensitive to only one out of three color separations {red/ green/blue). Such an 
objective may be attained under the premise that the information contained in the pictured scene doesn't 
exceed the capacity of the acquisition channel (T), i-e. of strong statistical dependencies among the three 
color separations. Understanding such dependencies, and properly exploiting them, is the key to success 
in the demosaicing endeavor. 
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Most evident is the fact that, away from object boundaries, chrominancdj seems to vary only over 
rather long scales. All modern demosaicing algorithms exploit this, though, usually, in a rather heuristic 
fashion. For instance, smooth hue transition [2] uses a bilinear interpolation to estimate, first, the green 
channel, and then the red-to-green and blue-to-green ratios. Interpolated red-to-green and blue-to-green 
differences, instead of ratios, have been used in [3 ]. Ellaborate versions of these ideas have been proposed 
in H, |0, 0, EL flU, 13 and flTU]] among others. Typically, all these algorithms have been tested on 
scanned analogical pictures (for a survey see [11]), for which the true values of the three color separations 
are known. It is far from obvious, though, why the heuristics that performs optimally on such tests would 
still do so on digitally captured data, whose statistics are likely different More importantly, performance 
on the very same data set on which an algorithm has been trained may have been achieved at the expense 
of the algorithm's ability to generalize, i.e. of performing well on so far unseen data lTT2l . 

Bayesian theory provides a framework for the integration of learned statistical models into robust 
estimates. The first approach of this kind to color demosaicing was proposed by Brainard lfT~3Tl iTPfll . 
who cast the problem as a statistical decision one: given the responses of interleaved subarrays of pixels 
and a model of the statistical distribution of color images, what image estimate minimizes the expected 
reconstruction error?. This question, borrowed from his paper ||T3l . highlights the two main difficulties a 
Bayesian approach must overcome: a) learning a prior model for the statistical distribution of color images; 
b) figuring out an inference algorithm that leverages this knowledge, together with the observed data, 
into optimal estimates. Somehow implicit here is an agreed upon meaning for what is to be understood 
as optimality, i.e. for how reconstruction errors are to be measured. The standard choice in Bayesian 
theory is mean squared error (MSE), and the corresponding estimator have come to be known as Bayes 
estimate. To be clear, a Bayes estimate guarantees that whatever prior knowledge we may have is put 
to optimal use, in a MSE sense. Obviously, its performance will still be limited by the accuracy of the 
model it exploits, as priors are liable to bias the estimates. 

The simplest choice of prior, beside the obvious non-informative one, is a multivariate normal. This was 
the choice Brainad made in his pioneering work |[T3l . It has the huge advantage that a vast mathematical 

'that is, the point-wise ratio between color separations. We will often refer to it just as color 

2 it may even change among digital cameras. Some of them position anti-aliasing filters in front of the sensors; others don't. 
Even more, the spectral response of the sensors may also vary among cameras. The spectral content of the illuminant is also 
subject to change 
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apparatus becomes inmmediately available both for learning and inference lff5l lPT6l . In addition, the 
resulting Bayes estimate is linear iTTTl . and hence of low computational complexity. Its main shortcoming, 
on the other hand, is that stationary gaussian distributions cannot capture the presence of singularities, 
like edges, that are ubiquitous in natural images. 

Modeling the statistics of images beyond second-order is a tremendous challenge, given the large 
number of degrees of freedom involved. The task is somehow made easier by the consideration of 
some symmetry constraints: it may be argued that images, as stochastic processes, are spatially uniform, 
isotropic, scale-invariant, and local. This last property is captured by the so called Markov random fields 
(MRF), and, for this reason, they have become essential tools in modeling the prior statistics of natural 
images. Regarding gray-scale ones, efforts to model their statistics seem to have coalesced in the field 
of experts framework proposed by Roth et al lfT8l : a model built upon an overcomplete set of spatially- 
localized, band-pass filters with t-student distributions. Efforts to learn a corresponding set of filters for 
color images, and exploit them for color demosaicing, have been reported in [19]. Hel-Or have built a 
gaussian prior based on a steerable wavelet representation ll20ll . 

Inference with MRF is not an easy task either. Usually, Bayes estimates are replaced with maximum-a- 
posteriori (MAP) ones. The resulting optimization problem turns often to be non-convex, and sophisticated 
techniques, like simulated annealing, are needed to tackle them. This is the strategy adopted by Mukherjee 
et al in 11211 . following the pioneering work of Geman and Geman ll22l . Alternatively, whenever appro- 
priate discretization is possible, loopy belief propagation ll23l ll24l could be employed. 

It is the purpose of this paper to develop a scheme for demosaicing that can incorporate realistic, 
learned prior knowledge, yet is amenable to efficient computation. The basic tenets of our approach are 
the following: 

realizing that color is constant over large patches, we regard it as a classical parameter. This allows 
us to recast color demosaicing as a blind linear inverse problem, with color parameterizing the unknown 
kernel, and brightness taking on the role of a latent variable. Our aim is to learn the former in an 
unsupervised manner, then infer the latter from the data. Such job will be carried out by an expectation- 
maximization algorithm ll25l . The main obstacle we face is, of course, figuring out a suitable regularizer. 
In this respect we make our second basic assumption, namely, color is modeled as a pairwise Markov 
random field whose neighborhood system encompasses all non-local color correlations as well as those 
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between color and brightness. As we will see, this basically amounts to claiming that the size of the local 
region over which color is expected to remain constant, and its shape, can be predicted from the local 
conditions of illumination and the color of the surrounding background. A regression model for such 
functional dependence is to be learned in advance in a supervised manner. In this respect, our scheme 
may be seen as providing a context for efforts initiated in ||26l E7ll . where neural networks were used 
to learn demosaicing interpolators as regressors. 

For the sake of clarity we will present our ideas in the context of the simplified toy problem depicted 
in Fig [T] iSJ : a ID array of pixels each of whom carries only two degrees of freedom, say red and green, 
instead of the usual three separations. They are represented by {(v X j,v y> j),j G Z}, the projection of 
vectors {vj,j € Z} on each of the two coordinate axes. The ratio between them, as measured by angles 
{Oj,j £ Z}, represents chrominance in this simplified world; brightness is represented by the modulus 
{lj = \vj\,j G Z}. Both projections are non-negative, and so Oj € [0, n/2] , Vj. At even numbered pixels, 
only v x (red) is observed; at odd ones, only v y (green). All observations are contaminated by white 
Gaussian noise. We are asked to reconstruct a full vector v at each position. 




Fig. 1. Toy problem. A ID array of pixels, each of whom carries two degrees of freedom, (v x , v y ). They are jointly represented 
by vector v. Color is represented by 8; I = |v| represents brightness. At even positions, j — 2k, only v x is observed; at odd 
ones, j = 2k + 1, only v y 

The paper is organized as follows: in section II we lay down the framework in which our ideas are 
implemented. We do it in the context of the aforementioned toy problem. In section III we turn back 
to our original demosaicing task, and touch on some issues left behind in the previous discussion. In 
section IV we present the results of a comparative study on the performances of a well established, state- 
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of-the-art demosaicing algorithm, and of ours. We close with a summary of our findings and directions 
for future work. 



Let's recall our toy problem, and introduce some notation: a ID array of pixels each of whom carries 
two degrees of freedom, v = {vj = (v x j, v y j), j G Z}. Our observations, y = {yj, j G Z}, comprise a 
noisy measurement of the red separation at even numbered pixels, and of the green one, at odd ones: 



We are asked to reconstruct a full vector v at every position. Instead of cartesian coordinates, we'll be 
using a polar representation of the pair of degrees of freedom (v x ,v y ): v x j = Ij cos 9j, and v y j = Ij sin 9j. 
Angles, 9 = {9j,j G Z}, as labels of the set of points whose coordinates share a common ratio, are 
proxies for color. Modules, 1 = {lj,j G Z}, correspond to brightness. To keep the notation as neat 
as possible, we will use the name of a magnitude with no site label to mean the set of all values of 
that magnitude in the chain. For instance, 9,\ will stand, respectively, for the sets 9 = {6j,j G Z} and 



Our problem is an ill-posed one, but the following simple observation will shed some light about how 
we main attempt to regularize it. Let's assume for a moment that our measurements are noise-free, and 
that color, though unknown, is constant, i.e. 9j = 9o,\/j. Under such circumstances it would suffice to 
require that brightness has no structure whatsoever in the highest frequency mode to turn our problem 
into a well-posed one. Let's see how 



II. An Expectation-Maximization estimation of color 




for j = 2k + 1 



for j = 2k 



(1) 



l = {lj,jeZ}. 



hk+i — o 



(2) 



j k 



k 



where 1 stands for the discrete Fourier transform of 1. In the absence of noise 




for j = 2k + 1 



for j = 2k 



(3) 



and so, we would conclude 



T,kV2k 

Estimates for v are immediately read off from Q and Q. 



<? = arctan ^^±1 



(4) 



February 12, 2010 



DRAFT 



6 



We now proceed to drop the strong assumptions we have just made. For the sake of clarity we will 
do it in steps: 

1. First we drop our requirement that measurements be noiseless, but still assume that color is constant. 
We replace ([2]) with a realistic prior for brightness, and Q with a maximum likelihood estimate. 

2. Next, we relax the assumption that color is constant, and demand from it to be just piece-wise 
constant. Ideas from robust statistics will be incorporated into a pairwise Markov random field to 
model this behavior. For the time being the graph structure of the Markov field will be assumed 
known. 

3. We enrich our color model by letting the graph structure of the Markov field become itself an 
unobserved variable. As it turns out, what we need is a regression model for the strength of the 
links between neighbors as a function of the background's color. This is to be learned off-line in a 
supervised manner. 

4. At last we turn our attention to color-brightness statistical dependencies. As it happens, they can 
be captured by just letting the regression model mentioned above be a function of both color and 
brightness. 

A. Constant color 

Assuming color is constant, we rewrite ([I]) as 

y = V(9 ) 1 + e (5) 

where V(9q) = diag(. . . , cosOq, siuOq, . . .) is a diagonal square matrix with entries cos 6q a t even rows, 
and sin#o at odd ones. Noise is assumed to be stationary, white and gaussian, e ~ (0; ct 2 I). General 
covariance matrices could be easily accommodated in our formalism, but these considerations are not 
within the scope of our interest. Anyhow, we do need to have the photosensors behavior characterized 
beforehand, so that the covariance matrix of measurements is known in advance. 

Eq. Q tries to emphasize one important insight into the problem of color demosaicing, namely, that, 
because color is almost constant over large patches, we may expect its posterior distribution given the data 
to be strongly peaked around its mode, and hence regard it as a classical parameter. Optimal estimates 
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for the magnitudes of interest can then be approximated as follows 



J x,3 



(6) 
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where 



arg max p(0o\y) 



(7) 



gA/L 



To proceed we need to make explicit our 



So, we are tasked with computing 6q , and E l|y, 6q 
model for the joint distribution of the relevant magnitudes. In making this choice we strive to keep a 
balance between computational tractability and fidelity. At this stage, we content ourselves with modeling 
color and brightness as marginally independent. 

p(y,l,0 o ) = -p(l) p(yMo) 
Likelihood and brightness priors are respectively given by 

p(y|l,0 o ) = M(y,V(e )l, a 2 l) 

p{\) = AA(1;0, S) (8) 

As usual, M(-) stands for a normal distribution of its first argument with average and covariance matrix 
specified in the second and third ones. 



([8]) captures only linear correlations. Extensive literature has been devoted to the characterization of 
the higher-order statistics of natural gray scale images (see for instance |[T8l ). and it seems only befitting 
to incorporate those ideas into p{\). Doing so would certainly yield quality gains to our demosaicing, 
particularly at high ISOs and low signal-to-noise ratios. In this paper, however, we content ourselves with 
pointing the issue out, but will not pursue it any further. A second comment regarding model ([8]) is in 
place here: it has been pointed out that natural gray scale images are invariant under scale transforms, 
as well as stationary and isotropic [28]. These symmetry requirements impose a power law on its power 
spectrum ifTTl 

FET' 1 = diag(e \lo\- u ) 
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T stands for the discrete Fourier transform, and u> is a frequency label. 

All the ingredients are now in place, and we proceed to compute 9q IL . Expectation-Maximization ll25l 
carries out this program by iteratively optimizing a subrogate function 



qML 
1 



lim 9, 

t— too 



(f) 





9 { q +1) = orgmaxgLip-^) 



with L{9;9 ( t] ) defined as 



In our problem, L(9; 9^) takes a simple form 



l(mS°) 



logp(y,l,9)\y,9 { 



a 2 L(9;9 t 



where Lq doesn't depend on 9, and 



Lq + P e cos 9 + P Q sin ( 



1 



{A 2 cos 2 9 + A 2 sin 2 9} 



Pe(y,9f) 
Po(y,9f) 



E^ 2 * E hk\y,9Q } 

k 



= E E [4+ily^, 



A^(y,^) 

EM then comes down to iteratively repeating two steps: 
• Expectation 



Compute 



Maximization 



g(*+l) 



P P A 2 A 2 

-<e; r oi L -*o 



arg max s {—-(A 2 cos 2 9 + A 2 sin 2 i 



+P e cos 9 + P sin 9} 



(9) 



Kalman filter [29] provides an efficient way of updating conditional expectations, as new observations 
become available, in linear-gaussian dynamical models. Despite our problem not fitting, nominally, into 
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this category, we can still make use of the algorithm: ([8]) plays the role of an initial distribution; as we 
move along the ID array, with trivial dynamics, the data observed at the new site becomes available, 
and the posterior distribution p(l\y, 0$ ) is updated accordingly. 

It may be easily verified that L(0;9q ) is a quasi-concave function. A binary search suffices to find 
its global maximum with any desired accuracy. 



B. Piece-wise constant color 

So far we have assumed that color is constant across the system. Of course, this is an oversimplification, 
and our next step will be to relax it: we will now demand only that color be piece-wise constant. 

We recall the way we have recast color demosaicing as a blind linear inverse problem: 

y = V{6)\ + e (10) 

A learning algorithm for the loading matrix V = diag(. . . , cos02k, sin^+i, • • •) must now be capable 
of automatically partitioning the ID array into patches of constant color. Of course, neither the number 
of patches nor their positions and sizes are known beforehand. To achieve this goal we need a suitable 
regularizer. Borrowing ideas from robust statistics, we model color as a pairwise Markov random field 

p(W) = i II e-A* I***-*')! (11) 
<tj>es 

S stands for the neighborhood system of the Markov field, and B = {/3ij,i,j G Z} quantify the strength 
of the link between pairs of neighbors. For the time being we assume that both S and /3 are known. 
Z{f3) is a normalization constant usually referred to as the partition function in the statistical physics 



literature. To get some insight into (111 notice that 



\sin(0i — 0j)\ = |uj x Uj\ (12) 

where uj , j € Z are unitary vectors parallel to \j at each site. The cross-product promotes configurations 
in which such vectors are aligned, while the laplacian distribution of fluctuations promotes isolated 
discontinuities [301. 
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Putting together the gaussian model for brightness, ([8]), and ( 11 ), we end up with the following joint 
distribution 

P (y,l,9) = p(y\l,9) p(l) p(9) 

where 

p(y\l,0) = M(y,V(0)l, a 2 I) (13) 

Pay attention that we are still assuming that color and brightness are marginally independent. In due 
course we will drop such assumption. 

To learn color we again resort to expectation-maximization. It now comes down to iteratively optimizing 
the following function 

jeZ jeZ 
~ °" 2 Ai|ui x Uj \ (14) 

<i,j>eS 



with 



and 



cos 9j , for j = 2k 
sin 9j , for j = 2k + 1 

Pj^E^-ly.flM] (15) 

'i]\y,0 {t) ' 



To validate our method we have carried out some simulations. The study was conducted as follows: 
we first extracted a linear profile along 256 pixel sites from a textured region of a gray scale image. This 
profile served as the brightness signal we'll later aim to reconstruct, {lj,j = 1, 2, . . . , 256}. A textured 
region was selected so that high-frequency structure is present in the profile. Next, we partitioned the 
linear array into several patches of different lengths, and assigned a constant color to each one. Then 
cropped the red and green projections in alternate sites, and synthetically added white, additive, zero- 
mean Gaussian noise. Noise levels were chosen to match realistic signal-to-noise ratios for ISOs up to 
ISO 800. Fig. ((2) illustrates the results of the study. 
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For this study we used a uniform regularizer with only nearest neighbors interactions, i.e. /Jy = (3q > 
for \i — j\ = 1, and = 0, otherwise. The value of @q was handpicked. As the graph of this Markov 
chain contains no loops, a global optimizer of a discrete version of ( [14] ) can be found using Viterbi's 



algorithm [23 1 . Regarding the expectation step (15 1, no changes are needed to cope with the new color 



model. As we did in the previous section, we again compute it using Kalman filter. 

As the plots show, the simple regularizer used in the simulations suffices to learn models of piece-wise 
constant color. As it turn out, it's not rich enough to successfully demosaic color images. We tackle this 
issue next. 
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Fig. 2. Reconstruction of color-brightness in a ID toy problem where color is piecewise constant. The plots on the left 

compare the reconstructed color (continuous red line) to its original value (green crosses). The plots on the right compare 

reconstructed brightness (continuous red line) to its original value (green crosses). Two noise levels have been simulated, 
namely, a — 30 (top), 50 (bottom). Both brightness and noise level are measured in arbitrary units. 
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C. An adaptive color model 

In the previous section we hypothesized that the neighborhood of each pixel, as well as the strength 
of the links to its neighbors, {/%, < ij > 6 S}, were known. In this section and the next we show how 
learned graph structures are plugged into the framework: 



In the Bayesian spirit we accept uncertainty in the graph structure of the pairwise Markov field by 
letting P = {fyj, i,j G Z} be themselves random variables^ The joint distribution now factorizes as 



p(y,i,e,P) = p(y\i,e)p(i)p(9\i3)p(p) 



(16) 



with jo(y|l, 9), p(l) given respectively by (13 1, ([8]), and p(0\P) given by (111. Fortunately, we don't need 
to make any explicit choice for p(P), neither compute the partition function Z(J$). Let's see why: 



Expectation-Maximization learns color by repeatedly optimizing a subrogate function that, given ( 16 1, 
( fTT) , looks like 

L(0;0 (t) ) = A,+E[io 5 p(yM)|y,0 (t) ' 

- E[A;|y,0 (i) ] | s in(0i-^)l 

<ij>es 



(17) 



where the only change with respect to ( 14 1 is that we have made the replacement 



Under model (16), this conditional expectation happens to be independent of the data, y 



/%|y,0 (t) 



d\dpp ijP {\,p\y,eV) 



jdl pjy^e^pjl) fdPP ijP (8V,P) 

jdi p(y\i,eM)p(i)fdp P (ew,p) 
dpp ijP (p\eM) 

~Pii\eV 



3 to be fully general, we should let the MRF include cliques involving more than just two lattice sites. S would then designate 
the set of all allowed cliques, and /3 = {/3c, C € S}. Appropriate potentials for the larger cliques, instead of ( |12| , would then 
have to be designed or learned too 
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Two important points need to be stressed here: 

a) we just need the expected value of the posterior distribution p((3\ 6), and not the distribution 
function itself 

b) being independent of the data, a regression model for this conditional expectation may be learned 
in advance 



D. Color-Brightness cross-dependencies 



So far (see (16)), we have assumed that color and brightness are marginally independent. This might 
seem a reasonable assumption given that brightness often changes at a fast pace in regions where color 
remains constant. Actually, it is not: steep changes in color are always accompanied by steep changes in 
brightness, and failure to have these boundaries registered results in highly visible artifacts. 



We may still capture color-brightness dependencies within a workable learning scheme if we just 
assume that they are fully encompassed in the graph structure of the image model, i.e. 

p{\,e\P)=pmp(p\p) 

The joint distribution then factorizes as 

p(y,l,6,P) = p(yM)p(l|/3) (18) 

x p(9\(3)p((3) 

What is left is to derive a learning algorithm for this model. This is done in the Appendix. Here we just 
summarize our findings: the entanglement between color and brightness poses a challenge. Fortunately, 



within a Laplace approximation, ( 14 1 remains valid as long as we replace {/%} with a regression function 
that now depends on both color and brightness 

fa — ►& i (l (t) ,0 W ) =E[/3 ij \l®,0®] (19) 
where 1W = E* l|y, 9^ ; E* stands for the conditional expectation with respect to the gaussian process 

p(yM)pQ). 



The regression function (19 1 enriches the one we introduced in the previous section. Basically, the 



claim we are making here is that, under model ( 18 1, the size and shape of the region around a pixel site, 
say i, where color is expected to remain constant, Si = {j G Z : > 0}, can be predicted from the 
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color of the surroundings and the intensity of the illumination there. Such predictive rule is to be learned 
in advance in a supervised manner. 



This completes our learning-inference framework for color demosaicing. The following pseudo-code 
summarizes the scheme 

0. Learn off-line the regression function (3{j(l,9) 

To demosaic an image, repeat iteratively the following steps 

1. Expectation: given the current estimate of color, infer brightness 



1(0 



i|y,0 (t) 



as well as the conditional variance of each of its components (as required in ( 14 1). 
2. Evaluate /3y (](*), a t the current estimates of color and brightness. Plug it into (14i. 



3. Maximization: update the estimate of color by optimizing ( 14 1 



III. Demosaicing color images 

In the previous section we presented our framework for color demosaicing in the context of the toy 
problem depicted in Fig|T] Demosaicing actual images requires some adjustments. In this section we 
review them: 

a. Whereas our toy problem comprised just two color separations, and a single angle, 9, was enough to 
specify their ratio at each site, color images comprise three separations. Color will now be represented 
by a 3D unitary vector taking values on the positive octant of the sphere, or, equivalently, by two 
angles, [0, vr/2]. Entries in the diagonal loading matrix T>(9, <fi) = diag(. . . , hj(9j, <f>j), . . .) 

need to be replaced with 

cos 9j , for j € Q 
hj{9j,4>j) = ^ sin9j cos </>j , for j 6 1Z 

for j G B 

where Q,lZ,BcZ 2 stand for the sublattices of the color filter array where green, red and blue are, 



respectively, measured; and ( 14 1 is replaced with 

jeZ 2 jeZ 2 

- £ a 2 /^l« 0« 0«)|u, x Ui | 

<i,j>es 



(20) 
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with Pj = Vj E lj\y,6W, , A] = E Z||y, 0^ , and u = (cos 0, sin cos 4>, sin sin ^>). 
b. Pixel sites in our toy problem were arranged in a ID chain. Images, on the other hand, are 2D. On 



the surface this seems to have required a very minor adjustment (compare ( |20| ) to (14i). However, 
the larger system size is going to force upon us some changes at the implementation level. We 
review them next: 

Expectation Step: In our simulations of the previous section, we computed ( [T5| ) by means of 
Kalman filter. The memory requirements of this algorithm, however, grow quadratically with 
the number of lattice sites, and become untenable for our ultimate purposes. A more efficient 
alternative is provided by low-memory quasi-Newton optimizers [31]. We exploit the fact that 
the average and the mode of a gaussian distribution coincide, and that its covariance matrix 
happens to be the inverse of the hessian of its logarithm. Low-memory quasi-Newton optimizers 
provide efficient estimates for both of them. 

Maximization Step: In our simulations of the previous section, we resorted to Viterbfs algo- 
rithm to find, with any desired accuracy, the global optimum of ( [T4] ). This was possible thanks 
to our readiness there to model color as a MRF with no loops in its graph (remember, we chose 
(3ij = for \i— j\ ^ 1). When demosaicing color images we cannot afford this simplification, and 
must deal with MRF whose graphs do contain loops. For them, belief propagation algorithms, 
like Viterbfs, cannot guarantee convergence to a global optima any longer. This is an important 
distinction, though only at a formal level; in practice, for the problem at hand and for many others 
|[32l . it happens to be of little relevance. There is, however, another obstacle that do prevent us 
from using loopy belief propagation, namely, the degrees of freedom we aim to estimate, 9 and 
<j>, take values in the continuum, instead of a discrete set. In the previous section, we bypassed 
this difficulty by just discretizing the interval [0,7r/2] finely enough. With color images, this is 
simply not viable. Sophisticated extensions, like non-parametric belief propagation ll33l . could 
still be considered. Fortunately, we don't need them, as the simplest of the approaches suffices 
to do the job: in the spirit of generalized EM 041 . we sequentially solve univariate optimization 
problems on each of the degrees of freedom involved. 

c. To initialize our whole procedure we have chosen, for its simplicity, Laroche's gradient-based 
interpolation [8]. More sophisticated interpolations, like those described in (9] and Chang: 1999, 
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may certainly yield improved results. Such considerations are, however, beyond the scope of our 
current interest. 

IV. Results 

Our primary goal so far has been the development of a framework for color demosaicing that may 
enable the solution of this problem in a systematic manner. We presented such framework in the 
preceeding sections. The main conclusion to be drawn from the ideas exposed there is the following: 
color demosaicing is possible inasmuch as a regressor /3(1, 9, <p) can be learned, i.e as long as the size and 
shape of the local region where color is expected to remain constant can be predicted from the color of the 
surroundings and the conditions of illumination there. Our scheme then provides a principled prescription 
for the integration of that knowledge into an adaptive filter similar in spirit to those proposed by Lukac 
et al 1351 . 

What is still missing for our framework to be complete is a recipe for learning /3(1, 9, (/>). The need 
for such procedure is made more acute by the realization that no single regression model might be 
suitable for all sensors and kinds of illuminants, irrespective of the spectral response of the former and 
the spectral content of the latter. Ideally then, whenever any of these two changes, /3(l, 9, (j)) should be 
learned anew, so that it trully captures the joint statistics of the three color separations that are actually 
presented to the demosaicing algorithm. Efforts on this front are underway and will be reported elsewhere. 

This said, it is evident that only empirical evidence can bring true plausibility to our approach. To 
provide it, we have crafted a regression model, with the help of feedback from human experts, and used 
it in a comparative study of performance with a well-known, state-of-the-art demosaicing algorithm. For 
the sake of completeness, we review the most remarkable features of this model of ours. 

Ml-M) = ^e-^-^ 2 / d2 ^Hl + ae-^/ R ) (21) 

a and R are constants; |r, — is the euclidean distance between lattice sites i and j; d(l,9,(p) sets 
the scale of the local region around each pixel where no color modulations are expected to occur. Its 
complex functional dependece on its arguments, {1, 9, (j)}, we have encoded in a regression tree [36] that 

implicitly defined as the outcome, at each iteration, of our expectation-maximization algorithm 
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we learned as follows: 

we first assembled a collection of patches of uniform color, and demosaiced them using various different 
values for the scale d. With the help of human experts, one of those values was chosen as optimal for 
the patch, and recorded. In addition to this scale, we also recorded, for selected pixels in the patch and 
for windows of increasing size around them, the maximum and minimum values of the green channel, 
and the red-to-green and blue-to-green differences. The latter are meant to serve as prediction attributes 
for d. Our regression tree was then learned from all this data. One issue must be stressed: the prediction 
attributes fed to the learning algorithm were collected from demosaiced patches. At no point whatsoever 
were the original color pictures, even if available, shown to it. Some remarkable aspects of our learned 
predictor are worth mentioning 

1. Non-locality: Windows of up to 20 x 20 pixels are needed to accurately predict d, even though d 
itself takes on much smaller values, typically in the interval ~ (0.25,3) pixels. 

2. Color Saturation: On backgrounds of saturated colors, d takes its smallest values, ~ 0.25 pixels; on 
non-saturated backgrounds, (where the three color separations take almost equal values), its largest, 
~ 3. pixels. In other words, color modulations are effectively quenched on backgrounds of the latter 
kind. This, in turn, should imply that the photosensor array, and possibly our visual system too, is 
most sensitive to brightness modulations that occur on top of backgrounds of such color^] 

3. Brightness Saturation On very bright regions, d takes large values. This is consistent with the 
behaviour of devices, like monitors, that saturate as brightness surpass certain threshold. In such 
conditions thay are rendered unable to reproduce color modulations. 

4. Color/Brightness Edges. Sharp changes in color are almost systematically accompanied by sharp 
changes in brightness, or, equivalently, an absence of changes in brightness is likely to imply that 
color remains constant. This behaviour is captured by the second factor in ( [21) . It is this term that 
confers shape to the local region around a pixel where color constancy holds. 

Our comparative study of performance follows on the footsteps of that carried out in ifTTI : 20 pictures 
were selected from the PhotoCD PCD0992 set of high quality color images 11371 . These are scanned 
analogical pictures, for which the true values of the three color separations are known. Data was cropped 
from them as if seen by a Bayer color filter array, and fed into our algorithm for demosaicing. To 
serve as references the same data was also fed to Li's successive approximation [38], SA. As in |[TT1l . 

5 for that matter, this might be the reason why this paper is written on a white sheet, instead of a vivid red one 
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two quantitative measures of performance are provided: PSNR and SCIELAB [39]. Neither of these two 
metrics, however, can trully measure how perceptually pleasant a demosaiced image is to a human viewer 
fl40l . For this reason, as it is customary, we report as well subjective assesments of performance. 

Tables [I] and [TTJ and Figure [3] illustrate the results of our study, whose conclusions we summarize as 
follows: SA consistenly scores higher than our algorithm both in SCIELAB and PSNR. On the other 
hand, our demosaiced images are free from the very evident color artifacts that plague SA reconstructions. 

For a better appreciation of the possibilities of the ideas described in this paper, we provide in Figj4] 
examples of images demosaiced with our algorithm from digitally captured data. To deal with this kind 
of data, a new regression tree for d(l, 9, 4>) needed to be learned. 



TABLE I 

SCIELAB for images demosaiced from data known with a precision of 8 bits. 



kodim 


01 


02 


03 


04 


05 


06 


07 


08 


09 


10 


Ours 


1.11 


1.01 


0.64 


0.64 


1.31 


0.94 


0.49 


1.21 


0.61 


0.57 


SA 


0.99 


0.73 


0.60 


0.64 


1.32 


0.89 


0.48 


1.23 


0.63 


0.56 




kodim 


11 


12 


15 


16 


19 


20 


21 


22 


23 


24 


Ours 


0.98 


0.47 


0.78 


0.56 


0.98 


0.63 


1.04 


0.98 


0.47 


1.02 


SA 


0.86 


0.49 


0.77 


0.54 


0.90 


0.62 


0.92 


1.00 


0.52 


1.03 



TABLE II 

PSNR for images demosaiced from data known with a precision of 8 bits. 



kodim 


01 


02 


03 


04 


05 


06 


07 


08 


09 


10 


Ours 


64.57 


64.52 


64.93 


66.00 


64.16 


64.65 


66.20 


64.67 


66.26 


66.56 


SA 


66.96 


66.61 


67.50 


67.67 


65.70 


66.66 


68.57 


65.72 


68.18 


68.12 
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11 


12 


15 


16 


19 


20 


21 


22 


23 


24 


Ours 


64.80 


67.21 


64.76 


65.25 


65.99 


65.88 


64.80 


65.07 


66.23 


64.35 


SA 


67.08 


68.57 


66.24 


67.99 


67.15 


67.07 


66.97 


66.36 


67.48 


64.84 
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Fig. 3. Illustrative examples of pictures demosaiced using the algorithm described in this paper (left). For comparisson, pictures 
demosaiced from the same data by succesive approximation (SA) are also shown (right). 

V. Conclusions 

We have presented a statistical learning framework for color demosaicing. The problem was recast as a 
blind linear inverse one: color parameterizes the unknown kernel; brightness takes on the role of a latent 
variable. An expectation-maximization algorithm was designed to learn the former in an unsupervised 
manner. The latter is then inferred from the data. 

Our learning algorithm is based on a pairwise Markov random field representation of color similarities. 
Non-local color correlations, as well as those between color and brightness, are absorbed into the graph 
structure of the Markov field. Letting this structure change as a function of the background's color and 
illumination makes the whole procedure adaptive. 
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Fig. 4. Illustrative examples of pictures demosaiced from digitally captured data by the algorithm described in this paper. To 
avoid aliasing artifacts when viewing the pictures in a monitor, please enlarge the images 



The aforementioned adaptive behaviour needs to be learned in advance. Even more, we claim demo- 
saicing is possible as long as such adaptive bahaviour can be learned. Our framework then readily accepts 
such prior knowledge in a plug-and-play fashion, through the regression function /3(1, 9, (j)). 

We have proposed efficient algorithms to carry out all but one of the subtasks in our expectation- 
maximization scheme. The missing piece is a procedure for learning, in an systematic manner, /3(1, 9, (j)). 
Progress on that track will be reported elsewhere. 



Appendix 

In this appendix we present a full derivation of the subrogate function L(9; 9^>) associated to model 



( 18 1. stands for the current estimate of color after t iterations 

L(M W ) = E[logp(y,l,t?,/3)|y,0M 



L + E 



log p(y|l, 0)^,0(0 



+E 



logp(0|/3)|y,0«) 
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We start with the second contribution to L(9; 0w) . Given (111, it reads 

E\logp(0\P)\y,9^ = 



<ij>e5 



Pij\y,6®] |sin(^-^)| 



In order to bring E Pij\y,0® into a desirable expression, we make the following approximation: as a 
function of 1, p(y|l, 0(*)) p(l) is a gaussian centered at 

/dllp(y|l,^))p(l) 



where E* 



Jdl p(y\l,e(t)) P Q.) 

lj\y, 0® stands for a conditional expectation with respect to the gaussian process p(y|l, 9^') p(l). 
Under the assumption that p(/3|l) varies slowly as a function of 1 (relative to the spread of p(y|l, 9®) p(l)), 
we approximate E Pij\y,6® 



as follows 



/%|y,# w 



dpp ijP (p\y,9^) 



Jdld(3 /%p(y,l,flW/3) 

JdLd/3p(y t l,e®,P) 
Jdldp ^ P (y\l,0( t )) P (8( t )\P)p(P\l)p(l) 

fdldPp(y\l,9®)p(9W\p)p(l3\i)pW 
J dld/3 fa p(y|l, gW) p(8 (t) \/3) p(0V) p(l) 
/dld/3p(y|l,0(*))p(0(*)|^)p(/3|K*))p(l) 

and, invoking (18 1, we complete our derivation 

/dgflg p(6^\f3) p(f3\\V) /dlp(y|l,gW)p(l) 
/rf/3p(^)|/3)p(/3|lW) /dlp(y|l,0(*>)p(l) 

JdPfrj P (e®\l3)p(lV\l3)p(l3) 

SdPp(9W\f})p(\V\P)p{P) 
JdPfrj p(lW,gW|/3)p(/3) 

SdppQ.<. t ),en\p)p(j3) 

dPfcj p(/3|l (t) ,0 (i) ) 
A*|l W ,* (t) 

The second of the two contributions reads 



E[logp(y|l,0)|y,0(*> 
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E 



logN(y;V(8)l,a 2 I)\y,eV 

2 ^E E N-^M 2 ly» e(t) 



with 



COS I 



for j = 2/c 



sin Oj , for j = 2k + 1 



For linear-gaussian processes, E 
we proceed as follows 



UJ 2 AyM t] 



can be efficiently computed. To deal with process (18 1 



Jdld/3 p(y,l,0(t),f3) 
fdld(3 ; jP (y|l,^))p(gW|/3)p(/3|l)p(l) 
fdMPp(y\l,OM)p(e®\0)p(J3\l)p(l) 

and making use of the same Laplaciarj^] approximation as above, we find 

fdup i jP {y\\,e^)p{e^)p{p\\^) P (\) 

Sd\df}p{y\\,6V)p{em3)p{P\\V)p{\) 
J dP p(QV\P) p(/3|lW) JdU,p(y\l,e^)p(l) 

f d(3 p(9(t)\f3) fdLp(y\l,OW)p(l) 
Jdl / jP (y|l,gW)p(l) 

Jdlp(y\l,0(t))p(l) 



E* 



Similarly, we have 



h\y,o (t) 



ll\y,0 {t) 
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6 we are keeping only linear correlations in the posterior distribution of 1, but strictly speaking this is not a Laplace 
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